
28 research outputs found

Efficient two-sample functional estimation and the super-oracle phenomenon

Author: Berrett Thomas B.
Samworth Richard J.
Publication date: 18/04/2019

We consider the estimation of two-sample integral functionals, of the type that occur naturally, for example, when the object of interest is a divergence between unknown probability densities. Our first main result is that, in wide generality, a weighted nearest neighbour estimator is efficient, in the sense of achieving the local asymptotic minimax lower bound. Moreover, we also prove a corresponding central limit theorem, which facilitates the construction of asymptotically valid confidence intervals for the functional, having asymptotically minimal width. One interesting consequence of our results is the discovery that, for certain functionals, the worst-case performance of our estimator may improve on that of the natural 'oracle' estimator, which is given access to the values of the unknown densities at the observations. Comment: 82 pages

arXiv.org e-Print Archive


USP: an independence test that improves on Pearson's chi-squared and the G-test.

Author: Berrett Thomas B
Samworth Richard J
Publication venue: Proc Math Phys Eng Sci
Publication date: 19/03/2022

We present the U-statistic permutation (USP) test of independence in the context of discrete data displayed in a contingency table. Either Pearson's χ²-test of independence or the G-test is typically used for this task, but we argue that these tests have serious deficiencies, both in terms of their inability to control the size of the test, and their power properties. By contrast, the USP test is guaranteed to control the size of the test at the nominal level for all sample sizes, has no issues with small (or zero) cell counts, and is able to detect distributions that violate independence in only a minimal way. The test statistic is derived from a U-statistic estimator of a natural population measure of dependence, and we prove that this is the unique minimum variance unbiased estimator of this population quantity. The practical utility of the USP test is demonstrated on both simulated data, where its power can be dramatically greater than that of Pearson's test, the G-test and Fisher's exact test, and on real data. The USP test is implemented in the R package USP.
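The permutation idea in this abstract can be illustrated with a minimal Python sketch. It is not the USP test itself: the statistic below is a simple plug-in estimate of the squared distance between the joint cell probabilities and the product of the marginals, swapped in for the paper's U-statistic, and the function names `plugin_dependence` and `permutation_pvalue` are hypothetical. What the sketch does share with the abstract is the permutation calibration, which is what gives size control at any sample size.

```python
import random


def plugin_dependence(xs, ys):
    """Plug-in estimate of sum_ij (p_ij - p_i+ * p_+j)^2 (assumed statistic, not the paper's U-statistic)."""
    n = len(xs)
    joint, row, col = {}, {}, {}
    for x, y in zip(xs, ys):
        joint[(x, y)] = joint.get((x, y), 0) + 1
        row[x] = row.get(x, 0) + 1
        col[y] = col.get(y, 0) + 1
    return sum(
        (joint.get((x, y), 0) / n - row[x] * col[y] / n ** 2) ** 2
        for x in row
        for y in col
    )


def permutation_pvalue(xs, ys, num_perm=999, seed=0):
    """Permutation p-value: shuffle ys to mimic independence, compare with the observed statistic."""
    rng = random.Random(seed)
    observed = plugin_dependence(xs, ys)
    ys_perm = list(ys)
    exceed = 0
    for _ in range(num_perm):
        rng.shuffle(ys_perm)
        if plugin_dependence(xs, ys_perm) >= observed:
            exceed += 1
    # Adding 1 to numerator and denominator keeps the test valid for finite samples.
    return (1 + exceed) / (1 + num_perm)


if __name__ == "__main__":
    xs = [0] * 20 + [1] * 20
    print(permutation_pvalue(xs, xs))  # perfectly dependent labels
```

Zero cell counts cause no difficulty here because the statistic never divides by an expected count, in contrast to Pearson's statistic.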

Apollo (Cambridge)

USP: an independence test that improves on Pearson's chi-squared and the G-test.

Author: Berrett Thomas B
Samworth Richard J
Publication venue: Proc Math Phys Eng Sci
Publication date: 26/01/2021

arXiv.org e-Print Archive

PubMed Central

Apollo (Cambridge)

Efficient multivariate entropy estimation via k-nearest neighbour distances

Author: Berrett Thomas B
Samworth Richard J
Yuan Ming
Publication venue: Annals of Statistics
Publication date: 01/01/2019

Many statistical procedures, including goodness-of-fit tests and methods for independent component analysis, rely critically on the estimation of the entropy of a distribution. In this paper, we seek entropy estimators that are efficient and achieve the local asymptotic minimax lower bound with respect to squared error loss. To this end, we study weighted averages of the estimators originally proposed by Kozachenko and Leonenko (1987), based on the $k$-nearest neighbour distances of a sample of $n$ independent and identically distributed random vectors in $\mathbb{R}^d$. A careful choice of weights enables us to obtain an efficient estimator in arbitrary dimensions, given sufficient smoothness, while the original unweighted estimator is typically only efficient when $d \leq 3$. In addition to the new estimator proposed and theoretical understanding provided, our results facilitate the construction of asymptotically valid confidence intervals for the entropy of asymptotically minimal width.
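As a rough illustration of the unweighted Kozachenko and Leonenko estimator mentioned in this abstract, the sketch below averages the log of the rescaled k-th nearest neighbour ball volumes. It assumes the standard form of the estimator, uses a brute-force neighbour search, and does not reproduce the weighted version studied in the paper; the function names are hypothetical.

```python
import math
import random


def digamma_int(k):
    """Digamma function at a positive integer k: -gamma + sum_{j=1}^{k-1} 1/j."""
    euler_gamma = 0.5772156649015329
    return -euler_gamma + sum(1.0 / j for j in range(1, k))


def unit_ball_volume(d):
    """Volume of the unit Euclidean ball in R^d."""
    return math.pi ** (d / 2) / math.gamma(1 + d / 2)


def kl_entropy(points, k=1):
    """Unweighted k-nearest-neighbour entropy estimate (in nats) for a list of d-tuples."""
    n = len(points)
    d = len(points[0])
    log_const = math.log(unit_ball_volume(d)) + math.log(n - 1) - digamma_int(k)
    total = 0.0
    for i, p in enumerate(points):
        # Brute-force distances from p to every other point: O(n^2) overall.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        rho = dists[k - 1]  # distance to the k-th nearest neighbour
        total += d * math.log(rho) + log_const
    return total / n


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [(rng.random(),) for _ in range(400)]
    # The uniform distribution on [0, 1] has entropy 0, so the estimate should be near 0.
    print(kl_entropy(sample, k=3))
```

For larger samples a k-d tree would replace the quadratic neighbour search; the brute-force loop is kept here only so that the sketch depends on nothing beyond the standard library.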

Warwick Research Archives Portal Repository

Apollo (Cambridge)

Optimal rates for independence testing via U-statistic permutation tests

Author: Berrett Thomas B.
Kontoyiannis Ioannis
Samworth Richard J.
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 06/11/2020

We study the problem of independence testing given independent and identically distributed pairs taking values in a $\sigma$-finite, separable measure space. Defining a natural measure of dependence $D(f)$ as the squared $L^2$-distance between a joint density $f$ and the product of its marginals, we first show that there is no valid test of independence that is uniformly consistent against alternatives of the form $\{f: D(f) \geq \rho^2 \}$. We therefore restrict attention to alternatives that impose additional Sobolev-type smoothness constraints, and define a permutation test based on a basis expansion and a $U$-statistic estimator of $D(f)$ that we prove is minimax optimal in terms of its separation rates in many instances. Finally, for the case of a Fourier basis on $[0,1]^2$, we provide an approximation to the power function that offers several additional insights. Our methodology is implemented in the R package USP. Comment: 58 pages, 4 figures
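Written out, the dependence measure described in this abstract would take roughly the following form. The symbols $\mathcal{X}$, $\mathcal{Y}$, $\mu_X$, $\mu_Y$, $f_X$ and $f_Y$ are notation assumed here for the two component spaces, their reference measures and the marginal densities; the abstract itself names only $f$ and $D(f)$.

```latex
\[
  D(f) = \int_{\mathcal{X} \times \mathcal{Y}}
         \bigl( f(x,y) - f_X(x)\, f_Y(y) \bigr)^2 \, d\mu_X(x)\, d\mu_Y(y),
\]
\[
  f_X(x) = \int_{\mathcal{Y}} f(x,y) \, d\mu_Y(y),
  \qquad
  f_Y(y) = \int_{\mathcal{X}} f(x,y) \, d\mu_X(x).
\]
```

Under this reading, $D(f) = 0$ exactly when $f$ equals the product of its marginals almost everywhere, that is, when the two components are independent.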

arXiv.org e-Print Archive

Warwick Research Archives Portal Repository

Discussion of 'Multivariate Fisher's independence test for multivariate dependence'

Author: Berrett Thomas B
Publication date: 04/05/2022

Invited discussion for Biometrika of 'Multivariate Fisher's independence test for multivariate dependence' by Gorsky and Ma (2022). Comment: 4 pages

arXiv.org e-Print Archive

Efficient functional estimation and the super-oracle phenomenon

Author: Berrett Thomas B
Samworth Richard J
Publication venue: Annals of Statistics
Publication date: 01/01/2023

We consider the estimation of two-sample integral functionals, of the type that occur naturally, for example, when the object of interest is a divergence between unknown probability densities. Our first main result is that, in wide generality, a weighted nearest neighbour estimator is efficient, in the sense of achieving the local asymptotic minimax lower bound. Moreover, we also prove a corresponding central limit theorem, which facilitates the construction of asymptotically valid confidence intervals for the functional, having asymptotically minimal width. One interesting consequence of our results is the discovery that, for certain functionals, the worst-case performance of our estimator may improve on that of the natural ‘oracle’ estimator, which itself can be optimal in the related problem where the data consist of the values of the unknown densities at the observations.

Warwick Research Archives Portal Repository

Apollo (Cambridge)


Optimal nonparametric testing of Missing Completely At Random, and its connections to compatibility

Author: Berrett Thomas B
Samworth Richard J
Publication venue: Annals of Statistics
Publication date: 21/09/2023

Given a set of incomplete observations, we study the nonparametric problem of testing whether data are Missing Completely At Random (MCAR). Our first contribution is to characterise precisely the set of alternatives that can be distinguished from the MCAR null hypothesis. This reveals interesting and novel links to the theory of Fréchet classes (in particular, compatible distributions) and linear programming, which allow us to propose MCAR tests that are consistent against all detectable alternatives. We define an incompatibility index as a natural measure of ease of detectability, establish its key properties, and show how it can be computed exactly in some cases and bounded in others. Moreover, we prove that our tests can attain the minimax separation rate according to this measure, up to logarithmic factors. Our methodology does not require any complete cases to be effective, and is available in the R package MCARtest.

Apollo (Cambridge)